AITopics | precision level

Collaborating Authors

precision level

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

P$^2$U: Progressive Precision Update For Efficient Model Distribution

Afrabandpey, Homayun, Tavakoli, Hamed Rezazadegan

arXiv.org Artificial IntelligenceJul-1-2025

Efficient model distribution is becoming increasingly critical in bandwidth-constrained environments. In this paper, we propose a simple yet effective approach called Progressive Precision Update (P$^2$U) to address this problem. Instead of transmitting the original high-precision model, P$^2$U transmits a lower-bit precision model, coupled with a model update representing the difference between the original high-precision model and the transmitted low precision version. With extensive experiments on various model architectures, ranging from small models ($1 - 6$ million parameters) to a large model (more than $100$ million parameters) and using three different data sets, e.g., chest X-Ray, PASCAL-VOC, and CIFAR-100, we demonstrate that P$^2$U consistently achieves better tradeoff between accuracy, bandwidth usage and latency. Moreover, we show that when bandwidth or startup time is the priority, aggressive quantization (e.g., 4-bit) can be used without severely compromising performance. These results establish P$^2$U as an effective and practical solution for scalable and efficient model distribution in low-resource settings, including federated learning, edge computing, and IoT deployments. Given that P$^2$U complements existing compression techniques and can be implemented alongside any compression method, e.g., sparsification, quantization, pruning, etc., the potential for improvement is even greater.

accuracy, artificial intelligence, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2506.22871

Genre: Research Report (1.00)

Industry: Transportation (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Add feedback

QSViT: A Methodology for Quantizing Spiking Vision Transformers

Putra, Rachmad Vidya Wicaksana, Iftikhar, Saad, Shafique, Muhammad

arXiv.org Artificial IntelligenceApr-1-2025

Vision Transformer (ViT)-based models have shown state-of-the-art performance (e.g., accuracy) in vision-based AI tasks. However, realizing their capability in resource-constrained embedded AI systems is challenging due to their inherent large memory footprints and complex computations, thereby incurring high power/energy consumption. Recently, Spiking Vision Transformer (SViT)-based models have emerged as alternate low-power ViT networks. However, their large memory footprints still hinder their applicability for resource-constrained embedded AI systems. Therefore, there is a need for a methodology to compress SViT models without degrading the accuracy significantly. To address this, we propose QSViT, a novel design methodology to compress the SViT models through a systematic quantization strategy across different network layers. To do this, our QSViT employs several key steps: (1) investigating the impact of different precision levels in different network layers, (2) identifying the appropriate base quantization settings for guiding bit precision reduction, (3) performing a guided quantization strategy based on the base settings to select the appropriate quantization setting, and (4) developing an efficient quantized network based on the selected quantization setting. The experimental results demonstrate that, our QSViT methodology achieves 22.75% memory saving and 21.33% power saving, while also maintaining high accuracy within 2.1% from that of the original non-quantized SViT model on the ImageNet dataset. These results highlight the potential of QSViT methodology to pave the way toward the efficient SViT deployments on resource-constrained embedded AI systems.

artificial intelligence, machine learning, quantization, (17 more...)

arXiv.org Artificial Intelligence

2504.00948

Country:

Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)
Europe > Italy > Lazio > Rome (0.04)

Genre: Research Report > New Finding (0.34)

Industry: Energy (0.34)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.98)

Add feedback

RAG-based User Profiling for Precision Planning in Mixed-precision Over-the-Air Federated Learning

Yuan, Jinsheng, Tang, Yun, Guo, Weisi

arXiv.org Artificial IntelligenceMar-19-2025

Mixed-precision computing, a widely applied technique in AI, offers a larger trade-off space between accuracy and efficiency. The recent purposed Mixed-Precision Over-the-Air Federated Learning (MP-OTA-FL) enables clients to operate at appropriate precision levels based on their heterogeneous hardware, taking advantages of the larger trade-off space while covering the quantization overheads in the mixed-precision modulation scheme for the OTA aggregation process. A key to further exploring the potential of the MP-OTA-FL framework is the optimization of client precision levels. The choice of precision level hinges on multifaceted factors including hardware capability, potential client contribution, and user satisfaction, among which factors can be difficult to define or quantify. In this paper, we propose a RAG-based User Profiling for precision planning framework that integrates retrieval-augmented LLMs and dynamic client profiling to optimize satisfaction and contributions. This includes a hybrid interface for gathering device/user insights and an RAG database storing historical quantization decisions with feedback. Experiments show that our method boosts satisfaction, energy savings, and global model accuracy in MP-OTA-FL systems.

large language model, machine learning, precision level, (17 more...)

arXiv.org Artificial Intelligence

2503.15569

Country:

North America > United States (0.05)
Europe > United Kingdom (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (0.71)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.54)

Add feedback

Efficient Methods for Overlapping Group Lasso

Lei Yuan, Jun Liu, Jieping Ye

Neural Information Processing SystemsFeb-11-2025, 18:14:58 GMT

The group Lasso is an extension of the Lasso for feature selection on (predefined) non-overlapping groups of features. The non-overlapping group structure limits its applicability in practice. There have been several recent attempts to study a more general formulation, where groups of features are given, potentially with overlaps between the groups. The resulting optimization is, however, much more challenging to solve due to the group overlaps. In this paper, we consider the efficient optimization of the overlapping group Lasso penalized problem. We reveal several key properties of the proximal operator associated with the overlapping group Lasso, and compute the proximal operator by solving the smooth and convex dual problem, which allows the use of the gradient descent type of algorithms for the optimization. We have performed empirical evaluations using both synthetic and the breast cancer gene expression data set, which consists of 8,141 genes organized into (overlapping) gene sets. Experimental results show that the proposed algorithm is more efficient than existing state-of-the-art algorithms.

artificial intelligence, machine learning, proximal operator, (17 more...)

Neural Information Processing Systems

Country: North America > United States > Arizona (0.14)

Genre: Research Report > New Finding (1.00)

Industry: Health & Medicine > Therapeutic Area > Oncology (0.69)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.46)

Add feedback

Neural Precision Polarization: Simplifying Neural Network Inference with Dual-Level Precision

Jayasuriya, Dinithi, Darabi, Nastaran, Hashem, Maeesha Binte, Trivedi, Amit Ranjan

arXiv.org Artificial IntelligenceNov-6-2024

We introduce a precision polarization scheme for DNN inference that utilizes only very low and very high precision levels, assigning low precision to the majority of network weights and activations while reserving high precision paths for targeted error compensation. This separation allows for distinct optimization of each precision level, thereby reducing memory and computation demands without compromising model accuracy. In the discussed approach, a floating-point model can be trained in the cloud and then downloaded to an edge device, where network weights and activations are directly quantized to meet the edge devices' desired level, such as NF4 or INT8. To address accuracy loss from quantization, surrogate paths are introduced, leveraging low-rank approximations on a layer-by-layer basis. These paths are trained with a sensitivity-based metric on minimal training data to recover accuracy loss under quantization as well as due to process variability, such as when the main prediction path is implemented using analog acceleration. Our simulation results show that neural precision polarization enables approximately 464 TOPS per Watt MAC efficiency and reliability by integrating rank-8 error recovery paths with highly efficient, though potentially unreliable, bit plane-wise compute-in-memory processing.

accuracy loss, artificial intelligence, machine learning, (14 more...)

arXiv.org Artificial Intelligence

2411.05845

Country:

North America > United States > Illinois > Cook County > Chicago (0.04)
North America > United States > Colorado (0.04)
Europe > Spain > Galicia > Madrid (0.04)

Genre: Research Report (0.84)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

SNN4Agents: A Framework for Developing Energy-Efficient Embodied Spiking Neural Networks for Autonomous Agents

Putra, Rachmad Vidya Wicaksana, Marchisio, Alberto, Shafique, Muhammad

arXiv.org Artificial IntelligenceJun-18-2024

Recent trends have shown that autonomous agents, such as Autonomous Ground Vehicles (AGVs), Unmanned Aerial Vehicles (UAVs), and mobile robots, effectively improve human productivity in solving diverse tasks. However, since these agents are typically powered by portable batteries, they require extremely low power/energy consumption to operate in a long lifespan. To solve this challenge, neuromorphic computing has emerged as a promising solution, where bio-inspired Spiking Neural Networks (SNNs) use spikes from event-based cameras or data conversion pre-processing to perform sparse computations efficiently. However, the studies of SNN deployments for autonomous agents are still at an early stage. Hence, the optimization stages for enabling efficient embodied SNN deployments for autonomous agents have not been defined systematically. Toward this, we propose a novel framework called SNN4Agents that consists of a set of optimization techniques for designing energy-efficient embodied SNNs targeting autonomous agent applications. Our SNN4Agents employs weight quantization, timestep reduction, and attention window reduction to jointly improve the energy efficiency, reduce the memory footprint, optimize the processing latency, while maintaining high accuracy. In the evaluation, we investigate use cases of event-based car recognition, and explore the trade-offs among accuracy, latency, memory, and energy consumption. The experimental results show that our proposed framework can maintain high accuracy (i.e., 84.12% accuracy) with 68.75% memory saving, 3.58x speed-up, and 4.03x energy efficiency improvement as compared to the state-of-the-art work for NCARS dataset. In this manner, our SNN4Agents framework paves the way toward enabling energy-efficient embodied SNN deployments for autonomous agents.

accuracy, precision level, timestep, (13 more...)

arXiv.org Artificial Intelligence

doi: 10.3389/frobt.2024.1401677

2404.09331

Country:

Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.04)
North America > United States > New York (0.04)
North America > United States > Illinois > Cook County > Chicago (0.04)
Europe > France > Hauts-de-France > Nord > Lille (0.04)

Genre: Research Report > New Finding (0.34)

Industry:

Energy (0.56)
Information Technology > Robotics & Automation (0.34)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

Mixed-Precision Over-The-Air Federated Learning via Approximated Computing

Yuan, Jinsheng, Wei, Zhuangkun, Guo, Weisi

arXiv.org Artificial IntelligenceJun-4-2024

Over-the-Air Federated Learning (OTA-FL) has been extensively investigated as a privacy-preserving distributed learning mechanism. Realistic systems will see FL clients with diverse size, weight, and power configurations. A critical research gap in existing OTA-FL research is the assumption of homogeneous client computational bit precision. Indeed, many clients may exploit approximate computing (AxC) where bit precisions are adjusted for energy and computational efficiency. The dynamic distribution of bit precision updates amongst FL clients poses an open challenge for OTA-FL, as is is incompatible in the wireless modulation superposition space. Here, we propose an AxC-based OTA-FL framework of clients with multiple precisions, demonstrating the following innovations: (i) optimize the quantization-performance trade-off for both server and clients within the constraints of varying edge computing capabilities and learning accuracy requirements, and (ii) develop heterogeneous gradient resolution OTA-FL modulation schemes to ensure compatibility with physical layer OTA aggregation. Our findings indicate that we can design modulation schemes that enable AxC based OTA-FL, which can achieve 50\% faster and smoother server convergence and a performance enhancement for the lowest precision clients compared to a homogeneous precision approach. This demonstrates the great potential of our AxC-based OTA-FL approach in heterogeneous edge computing environments.

federated learning, learning, precision, (13 more...)

arXiv.org Artificial Intelligence

2406.03402

Country:

Europe > United Kingdom (0.05)
North America > United States > New York > New York County > New York City (0.04)

Genre: Research Report > New Finding (0.48)

Industry: Information Technology (0.48)

Technology:

Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.69)

Add feedback

Quantifying the Capabilities of LLMs across Scale and Precision

Badshah, Sher, Sajjad, Hassan

arXiv.org Artificial IntelligenceMay-7-2024

Scale is often attributed as one of the factors that cause an increase in the performance of LLMs, resulting in models with billion and trillion parameters. One of the limitations of such large models is the high computational requirements that limit their usage, deployment, and debugging in resource-constrained scenarios. Two commonly used alternatives to bypass these limitations are to use the smaller versions of LLMs (e.g. Llama 7B instead of Llama 70B) and lower the memory requirements by using quantization. While these approaches effectively address the limitation of resources, their impact on model performance needs thorough examination. In this study, we perform a comprehensive evaluation to investigate the effect of model scale and quantization on the performance. We experiment with two major families of open-source instruct models ranging from 7 billion to 70 billion parameters. Our extensive zero-shot experiments across various tasks including natural language understanding, reasoning, misinformation detection, and hallucination reveal that larger models generally outperform their smaller counterparts, suggesting that scale remains an important factor in enhancing performance. We found that larger models show exceptional resilience to precision reduction and can maintain high accuracy even at 4-bit quantization for numerous tasks and they serve as a better solution than using smaller models at high precision under similar memory requirements.

arxiv preprint arxiv, quantization, reasoning, (15 more...)

arXiv.org Artificial Intelligence

2405.03146

Country:

Asia > Middle East > Jordan (0.04)
North America > Canada > Nova Scotia (0.04)
Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)

Genre: Research Report > New Finding (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Efficient Methods for Overlapping Group Lasso

Neural Information Processing SystemsMar-14-2024, 21:34:02 GMT

algorithm, lasso, proximal operator, (15 more...)

Neural Information Processing Systems

Country:

North America > United States > Arizona > Maricopa County > Tempe (0.04)
North America > United States > New York (0.04)
Europe > France (0.04)

Genre: Research Report > New Finding (1.00)

Industry: Health & Medicine > Therapeutic Area > Oncology (0.69)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.46)

Add feedback

DisGNet: A Distance Graph Neural Network for Forward Kinematics Learning of Gough-Stewart Platform

Zhu, Huizhi, Xu, Wenxia, Huang, Jian, Li, Jiaxin

arXiv.org Artificial IntelligenceFeb-14-2024

In this paper, we propose a graph neural network, DisGNet, for learning the graph distance matrix to address the forward kinematics problem of the Gough-Stewart platform. DisGNet employs the k-FWL algorithm for message-passing, providing high expressiveness with a small parameter count, making it suitable for practical deployment. Additionally, we introduce the GPU-friendly Newton-Raphson method, an efficient parallelized optimization method executed on the GPU to refine DisGNet's output poses, achieving ultra-high-precision pose. This novel two-stage approach delivers ultra-high precision output while meeting real-time requirements. Our results indicate that on our dataset, DisGNet can achieves error accuracys below 1mm and 1deg at 79.8\% and 98.2\%, respectively. As executed on a GPU, our two-stage method can ensure the requirement for real-time computation. Codes are released at https://github.com/FLAMEZZ5201/DisGNet.

disgnet, matrix, platform, (16 more...)

arXiv.org Artificial Intelligence

2402.09077

Country:

Asia > China > Hubei Province > Wuhan (0.04)
Asia > Japan > Honshū > Chūbu > Ishikawa Prefecture > Kanazawa (0.04)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Architecture (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback